THIS WORK IS YET TO BE REVIEWED
This paper contains estimates for the reproductive number \(R_{t,m}\) over time \(t\) in various countries \(m\). It also shows the same for provinces within South Africa.. This is done using the methodology as described in [1]. These have been implemented in R using EpiEstim package [2] which is what is used here.
This paper and it’s results should be updated roughly daily and is available online.
As this paper is updated over time this section will summarise significant changes. The code producing this paper is tracked using Git. The Git commit hash for this project at the time of generating this paper was 6cbf8f93c9af4946f466d49f7450d7c3349934a9.
2020-06-12
2020-06-13
The project uses the following libraries.
require(EpiEstim)
require(EnvStats)
require(ggplot2)
require(ggpubr)
require(lubridate)
require(utils)
require(httr)
require(dplyr)
require(tidyr)
require(scales)
All countries available in the data will be analysed but also the provinces of South Africa. This paper produces charts for the following countries only:
country_codes = c("BR", # Brazil
"CA", # Canada
"CL", # Chile
"IN", # India
"IE", # Ireland
"IT", # Italy
"PE", # Peru
"ZA", # South Africa
"ES", # Spain
"UK" # United Kingdom
)
Data is downloaded from the Git repository associated with [3].
# Provincial Deaths
GET(url = "https://raw.githubusercontent.com/dsfsi/covid19za/master/data/covid19za_provincial_cumulative_timeline_deaths.csv", write_disk("covid19za_provincial_cumulative_timeline_deaths.csv",
overwrite = TRUE))
Response [https://raw.githubusercontent.com/dsfsi/covid19za/master/data/covid19za_provincial_cumulative_timeline_deaths.csv]
Date: 2020-06-14 13:59
Status: 200
Content-Type: text/plain; charset=utf-8
Size: 7.41 kB
<ON DISK> C:\Users\lrossou\Desktop\COVID-19\rt_estimates\covid19za_provincial_cumulative_timeline_deaths.csv
# Provincial Cases
GET(url = "https://raw.githubusercontent.com/dsfsi/covid19za/master/data/covid19za_provincial_cumulative_timeline_confirmed.csv", write_disk("covid19za_provincial_cumulative_timeline_confirmed.csv",
overwrite = TRUE))
Response [https://raw.githubusercontent.com/dsfsi/covid19za/master/data/covid19za_provincial_cumulative_timeline_confirmed.csv]
Date: 2020-06-14 13:59
Status: 200
Content-Type: text/plain; charset=utf-8
Size: 9.33 kB
<ON DISK> C:\Users\lrossou\Desktop\COVID-19\rt_estimates\covid19za_provincial_cumulative_timeline_confirmed.csv
First read in the data from the downloaded comma-separated values text file.
# Read from CSVs above
data_za_cases <- read.csv("covid19za_provincial_cumulative_timeline_confirmed.csv", stringsAsFactors = FALSE)
data_za_deaths <- read.csv("covid19za_provincial_cumulative_timeline_deaths.csv", stringsAsFactors = FALSE)
In the case data file row 21 and 32 contain no provincial details. We estimated it by spreading the national total to the provinces in proportion to a mixture of the prior day and the next day.
data_za_cases[21, c("EC", "FS", "GP", "KZN", "LP", "MP", "NC", "NW", "WC", "UNKNOWN")] <- colSums(data_za_cases[c(20, 22), c("EC", "FS", "GP", "KZN",
"LP", "MP", "NC", "NW", "WC", "UNKNOWN")])/sum(data_za_cases[c(20, 22), ]$total) * data_za_cases[21, ]$total
data_za_cases[32, c("EC", "FS", "GP", "KZN", "LP", "MP", "NC", "NW", "WC", "UNKNOWN")] <- colSums(data_za_cases[c(31, 33), c("EC", "FS", "GP", "KZN",
"LP", "MP", "NC", "NW", "WC", "UNKNOWN")])/sum(data_za_cases[c(31, 33), ]$total) * data_za_cases[32, ]$total
The following function will be applied to case and death data. In this function the following occurs:
SA column is added as the sum of the new per province data.fix_data_za <- function(data_za, start_date = as.Date("2020-03-01"), end_date = as.Date("2020-03-31")) {
# Scale provinces by scale factor (assume unknown are in proportion)
data_za[, c("EC", "FS", "GP", "KZN", "LP", "MP", "NC", "NW", "WC")] <- data_za[, c("EC", "FS", "GP", "KZN", "LP", "MP", "NC", "NW", "WC")] * (1 +
data_za$UNKNOWN/rowSums(data_za[, c("EC", "FS", "GP", "KZN", "LP", "MP", "NC", "NW", "WC")]))
# Only select columns we need
data_za <- data_za %>% select("date", "EC", "FS", "GP", "KZN", "LP", "MP", "NC", "NW", "WC")
data_za$date <- as.Date(data_za$date, "%d-%m-%Y")
# Round data so we have integer cases
data_za[, c("EC", "FS", "GP", "KZN", "LP", "MP", "NC", "NW", "WC")] <- round(data_za[, c("EC", "FS", "GP", "KZN", "LP", "MP", "NC", "NW", "WC")],
0)
# Calculate a new SA column
data_za$SA <- rowSums(data_za[, c("EC", "FS", "GP", "KZN", "LP", "MP", "NC", "NW", "WC")])
# 'Melt' the data
data_za <- pivot_longer(data_za, cols = c("EC", "FS", "GP", "KZN", "LP", "MP", "NC", "NW", "WC", "SA"), names_to = "province", values_to = "count")
# Getting daily data from the cumulative data set
data_za = data_za %>% group_by(province) %>% arrange(date) %>% mutate(count = count - lag(count, default = 0)) %>% ungroup()
# add missing dates
all_dates <- expand_grid(date = seq(start_date, end_date, 1), province = levels(as.factor(data_za$province)))
# join
data_za <- left_join(all_dates, data_za, by = c("date", "province"))
# province factor
data_za$province <- as.factor(data_za$province)
# 0 for NAs
data_za$count <- ifelse(is.na(data_za$count), 0, data_za$count)
# remove negatives
data_za$count <- ifelse(data_za$count < 0, 0, data_za$count)
# get cumulative counts
data_za <- data_za %>% group_by(province) %>% mutate(cumulative_count = cumsum(count)) %>% ungroup()
return(data_za)
}
Below we use the function above to process deaths and cases and then combine them into a single dataset.
start_date <- min(as.Date(data_za_cases$date, "%d-%m-%Y"))
end_date <- max(as.Date(data_za_cases$date, "%d-%m-%Y"))
data_za_cases <- fix_data_za(data_za_cases, start_date = start_date, end_date = end_date)
data_za_deaths <- fix_data_za(data_za_deaths, start_date = start_date, end_date = end_date)
data_za_cases <- cbind("cases", data_za_cases)
data_za_deaths <- cbind("deaths", data_za_deaths)
colnames(data_za_cases)[1] <- "type"
colnames(data_za_deaths)[1] <- "type"
# combined
data_za <- rbind(data_za_cases, data_za_deaths)
# remove data sets no longer needed
rm("data_za_cases", "data_za_deaths", "start_date", "end_date", "fix_data_za")
Data for other countries are downloaded from [4].
GET(url = "https://opendata.ecdc.europa.eu/covid19/casedistribution/csv", write_disk("ecdc_casedistribution.csv", overwrite = TRUE))
Response [https://opendata.ecdc.europa.eu/covid19/casedistribution/csv/]
Date: 2020-06-14 13:59
Status: 200
Content-Type: application/octet-stream
Size: 1.37 MB
<ON DISK> C:\Users\lrossou\Desktop\COVID-19\rt_estimates\ecdc_casedistribution.csv
First read in the data from the downloaded comma-separated values text file.
# Read from CSVs above
data <- read.csv("ecdc_casedistribution.csv", stringsAsFactors = FALSE)
Update data by doing the following:
# get date
data$date <- as.Date(data$dateRep, format = "%d/%m/%Y")
# get cases and deaths
data <- data %>% select(continentExp, date, geoId, countriesAndTerritories, cases, deaths)
# column names
colnames(data) <- c("continent", "date", "country_code", "country", "cases", "deaths")
# add 0 data for dates that were skipped
for (c in levels(as.factor(data$country))) {
c_dates <- (data %>% filter(country == c))$date
check_dates <- seq(min(c_dates), max(c_dates), by = 1)
missing_dates <- check_dates[!check_dates %in% c_dates]
continent <- (data %>% filter(country == c))$continent[1]
country_code <- (data %>% filter(country == c))$country_code[1]
if (length(missing_dates) > 0) {
missing_data <- data.frame(continent = continent, date = missing_dates, country_code = country_code, country = rep(c, length(missing_dates)),
cases = rep(0, length(missing_dates)), deaths = rep(0, length(missing_dates)))
data <- rbind(data, missing_data)
}
}
# 'Melt' the data
data <- pivot_longer(data, cols = c("cases", "deaths"), names_to = "type", values_to = "count")
# fields as factors
data$continent <- as.factor(data$continent)
data$country_code <- as.factor(data$country_code)
data$country <- as.factor(data$country)
data$type <- as.factor(data$type)
# make sure it is a data frame
data <- as.data.frame(data)
# zeroise counts below 0
data$count <- ifelse(data$count < 0, 0, data$count)
# order and get group by
data <- data %>% group_by(country_code, country, type) %>% arrange(country_code, country, type, date) %>% mutate(cumulative_count = cumsum(count)) %>%
ungroup()
Google released mobility data indexes that deviations from a baseline of movement during 3 January to 6 February 2020 [5]. This data is downloaded and prepared for linking to our estimates of \(R_{t,m}\).
The data is made available as a comma-separated file which is updated regularly but not daily. Data is sometimes up to a week behind.
GET(url = "https://www.gstatic.com/covid19/mobility/Global_Mobility_Report.csv", write_disk("google_mobility_data.csv", overwrite = TRUE))
Response [https://www.gstatic.com/covid19/mobility/Global_Mobility_Report.csv]
Date: 2020-06-14 13:59
Status: 200
Content-Type: text/csv
Size: 35.6 MB
<ON DISK> C:\Users\lrossou\Desktop\COVID-19\rt_estimates\google_mobility_data.csv
First read in the file:
# Read from CSVs above
google_mobility_data <- read.csv("google_mobility_data.csv", stringsAsFactors = FALSE, na.strings = "")
Below the data is prepared by doing the following:
# fix dates
google_mobility_data$date <- as.Date(google_mobility_data$date, format = "%Y-%m-%d")
# keep only country level data but keep all data for ZA
google_mobility_data <- google_mobility_data %>% filter(country_region_code == "ZA" | is.na(sub_region_1))
# pick only the fields required
google_mobility_data <- google_mobility_data %>% select(country_region_code, country_region, sub_region_1, iso_3166_2_code, date, retail_and_recreation_percent_change_from_baseline,
grocery_and_pharmacy_percent_change_from_baseline, parks_percent_change_from_baseline, transit_stations_percent_change_from_baseline, workplaces_percent_change_from_baseline,
residential_percent_change_from_baseline)
# rename the fields
colnames(google_mobility_data) <- c("country_code", "country", "province", "province_code", "date", "retail_and_recreation", "grocery_and_pharmacy",
"parks", "transit_stations", "workplaces", "residential")
# move province data to country for ease later
google_mobility_data$country_code[!is.na(google_mobility_data$province_code)] <- google_mobility_data$province_code[!is.na(google_mobility_data$province_code)]
google_mobility_data$country[!is.na(google_mobility_data$province_code)] <- google_mobility_data$province[!is.na(google_mobility_data$province_code)]
# rename KZN consistently
google_mobility_data$country_code[google_mobility_data$country_code == "ZA-NL"] <- "ZA-KZN"
# drop province columns
google_mobility_data <- google_mobility_data[, c(-3, -4)]
# divide indexes by 100 to have numbers between 0 and 1
google_mobility_data[, 4:9] <- google_mobility_data[, 4:9]/100
# fix GB code to UK
google_mobility_data$country_code[google_mobility_data$country_code == "GB"] <- "UK"
# make factors
google_mobility_data$country_code <- as.factor(google_mobility_data$country_code)
google_mobility_data$country <- as.factor(google_mobility_data$country)
Further to the above we also calculate an average mobility index per [6]. This index averages the mobility indexes but exclude the following indexes:
google_mobility_data$average_mobility <- (google_mobility_data$retail_and_recreation + google_mobility_data$grocery_and_pharmacy + google_mobility_data$transit_stations +
google_mobility_data$workplaces)/4
To simplify further analysis it’s easier to combine all data in a single dataset:
# Add a 'continent' field to data_za and combine
data_za <- cbind("SA", data_za)
colnames(data_za)[1] <- "continent"
colnames(data_za)[4] <- "country"
data_za$country_code <- paste0("ZA-", data_za$country)
data <- rbind(data, data_za)
# remove all objects except what we need
rm(list = ls()[!ls() %in% c("data", "country_codes", "google_mobility_data")])
Below we plot cumulative case count on a log scale by province:
ggplot(data %>% filter(continent == "SA" & type == "cases" & cumulative_count > 0), aes(x = date, y = cumulative_count)) + geom_line(aes(color = country),
size = 1) + scale_y_log10(labels = comma) + ggtitle("Cumulative Cases by Province") + xlab("Date") + ylab("Cumulative Cases") + theme_bw() + theme(axis.text.x = element_text(angle = 45,
hjust = 1), legend.position = "right") + guides(fill = guide_legend(ncol = 1)) + scale_color_hue(l = 50)
Below we plot the cumulative deaths by province on a log scale:
ggplot(data %>% filter(continent == "SA" & type == "deaths" & cumulative_count > 0), aes(x = date, y = cumulative_count)) + geom_line(aes(color = country),
size = 1) + scale_y_log10(labels = comma) + ggtitle("Cumulative Deaths by Province") + xlab("Date") + ylab("Cumulative Deaths") + theme_bw() + theme(axis.text.x = element_text(angle = 45,
hjust = 1), legend.position = "right") + guides(fill = guide_legend(ncol = 1)) + scale_color_hue(l = 50)
Below we plot average mobility indexes by province:
ggplot(data = google_mobility_data[substr(as.character(google_mobility_data$country_code), 1, 2) == "ZA", ], aes(x = date, y = average_mobility, colour = country)) +
geom_line() + scale_color_hue(l = 50) + theme_bw()
Below we plot cumulative case count on a log scale by country:
ggplot(data %>% filter(continent != "SA" & type == "cases" & country_code %in% country_codes & cumulative_count > 0), aes(x = date, y = cumulative_count)) +
geom_line(aes(color = country), size = 1) + scale_y_log10(labels = comma) + ggtitle("Cumulative Cases by Country") + xlab("Date") + ylab("Cumulative Cases") +
theme_bw() + theme(axis.text.x = element_text(angle = 45, hjust = 1), legend.position = "right") + guides(fill = guide_legend(ncol = 1)) + scale_color_hue(l = 50)
Below we plot the cumulative deaths by country on a log scale:
ggplot(data %>% filter(continent != "SA" & type == "deaths" & country_code %in% country_codes & cumulative_count > 0), aes(x = date, y = cumulative_count)) +
geom_line(aes(color = country), size = 1) + scale_y_log10(labels = comma) + ggtitle("Cumulative Deaths by Country") + xlab("Date") + ylab("Cumulative Deaths") +
theme_bw() + theme(axis.text.x = element_text(angle = 45, hjust = 1), legend.position = "right") + guides(fill = guide_legend(ncol = 1)) + scale_color_hue(l = 50)
Below we plot average mobility indexes by country:
ggplot(data = google_mobility_data %>% filter(country_code %in% country_codes), aes(x = date, y = average_mobility, colour = country)) + geom_line() +
scale_color_hue(l = 50) + theme_bw()
To do further analysis an serial interval assumption is needed. The serial interval is taken from [7]. It’s assumed to be Gamma distributed with mean of 6.48 and standard deviation of 3.83. This corresponds to the effective infectiousness of an individual since acquiring the infection themselves.
We plot this serial distribution below:
si_mean <- 6.48
si_sd <- 3.83
si_cv <- si_sd/si_mean
serial_interval = rep(0, 1)
serial_interval[1] = (pgammaAlt(1.5, si_mean, si_cv) - pgammaAlt(0, si_mean, si_cv))
for (i in 2:50) {
serial_interval[i] = (pgammaAlt(i + 0.5, si_mean, si_cv) - pgammaAlt(i - 0.5, si_mean, si_cv))
}
ggplot(data.frame(days = seq(1, 20, 1), serial_interval = serial_interval[1:20]), aes(y = serial_interval, x = days)) + geom_line(size = 1) + theme_bw() +
theme(axis.text.x = element_text(angle = 45, hjust = 1), legend.position = "right") + guides(fill = guide_legend(ncol = 1)) + scale_color_hue(l = 50)
The following code works backwards and fits \(R_{t,m}\) values for whole weeks (ending on the last date in the data) using the EpiEstim package. Using deaths or infection the as the cases in the puts the estimate of \(R_{t,m}\) as at the date of the cases being tracked.
So \(t\) is based on the date of reporting (be that cases or deaths) and \(m\) is the country or province. Two values are estimated for each country:
Note that the time periods are left unadjusted, though in reality the \(R_{t,m}^{deaths}\) should be shifted back approximately 2 weeks relative to \(R_{t,m}^{cases}\).
Rt_data <- NULL
for (c in levels(data$country)) {
for (t in levels(data$type)) {
# filter out country data of type t
c_data <- data %>% filter(country == c & type == t)
# vector of count of cases/deaths
I <- c_data$count
# t=1 corresponds to this date:
t1_date <- min(c_data$date)
# the day after the first case/death:
t_start_date <- min((c_data %>% filter(count > 0))$date) + 1
t_start <- min(seq(1, length(I))[I > 0]) + 1
# last day of cases/deaths
t_end <- length(I)
t_end_date <- t1_date + (t_end - 1)
# how many full weeks do we have
full_weeks <- floor((t_end - t_start)/7)
# only continue if we have 1 full week or more
if (full_weeks > 0) {
# then divide period into weeks
T.Start <- seq(t_end - 7 * full_weeks, t_end - 7, by = 7)
T.End <- T.Start + 7
# estimate $R_{t,m}^{type}$ for each week:
c_Rt <- EstimateR(I, T.Start = T.Start, T.End = T.End, Mean.SI = si_mean, Std.SI = si_sd, method = "ParametricSI")$R
# count cases
c_data$week <- floor(as.numeric(c_data$date - (t1_date + T.Start[1]))/7)
c_data$dd <- c(0, c_data$date[2:nrow(c_data)] - c_data$date[1:(nrow(c_data) - 1)])
c_agg_data <- c_data %>% filter(week >= 0) %>% group_by(week) %>% summarise(count = sum(count)) %>% ungroup()
# add country and type designations
c_Rt <- cbind(c_data$continent[1], c_data$country_code[1], c, t, c_Rt, c_agg_data$count)
# and add dates
c_Rt$date_start <- t1_date + c_Rt$T.Start
c_Rt$date_end <- t1_date + c_Rt$T.End - 1
# combine the results
if (is.null(Rt_data)) {
Rt_data <- c_Rt
} else {
Rt_data <- rbind(Rt_data, c_Rt)
}
}
}
}
Below we prep column headings and prepare CI data:
# Column names
colnames(Rt_data) <- c("continent", "country_code", "country", "type", "t_start", "t_end", "Rt_mean", "Rt_std", "Rt_li_95", "Rt_li_90", "Rt_li_50", "Rt_median",
"Rt_ui_50", "Rt_ui_90", "Rt_ui_95", "count", "date_start", "date_end")
# mid week data point
Rt_data$date_mid <- Rt_data$date_start + (Rt_data$date_end - Rt_data$date_start)/2
# split out CI data into different dataset
Rt_ci_data <- NULL
for (ci in c("50", "90", "95")) {
r <- data.frame(country_code = Rt_data$country_code, country = Rt_data$country, type = Rt_data$type, ci = rep(paste0(ci, "%"), nrow(Rt_data)), date_mid = Rt_data$date_mid,
Rt_mean = Rt_data$Rt_mean, Rt_li = Rt_data[[paste0("Rt_li_", ci)]], Rt_ui = Rt_data[[paste0("Rt_ui_", ci)]])
Rt_ci_data <- rbind(r, Rt_ci_data)
}
# remove what is not needed
rm(list = ls()[!ls() %in% c("data", "country_codes", "Rt_data", "Rt_ci_data", "google_mobility_data")])
Rt_dataLinking the mobility data should be done in a way that corresponds to the time the infection should have occurred not the time of death. Below new date ranges are calculated to correspond roughly to the time of infection:
Rt_data$date_inf_start <- Rt_data$date_start - ifelse(Rt_data$type == "death", 23 + 7, 6 + 7)
Rt_data$date_inf_end <- Rt_data$date_end - ifelse(Rt_data$type == "death", 23 + 7, 6 + 7)
Rt_data$date_inf_mid <- Rt_data$date_mid - ifelse(Rt_data$type == "death", 23 + 7, 6 + 7)
The code below calculates average weekly mobility data averaged over the same date ranges as calculated above.
weekly_mobility_data <- NULL
combined_country_codes <- levels(Rt_data$country_code)[levels(Rt_data$country_code) %in% levels(google_mobility_data$country_code)]
for (c in combined_country_codes) {
for (t in levels(Rt_data$type)) {
c_data <- Rt_data %>% filter(country_code == c & type == t)
if (nrow(c_data) > 0) {
for (i in 1:nrow(c_data)) {
w_data <- google_mobility_data %>% filter(country_code == c & date >= c_data$date_inf_start[i] & date <= c_data$date_inf_end[i]) %>%
group_by(country_code) %>% summarise(retail_and_recreation = mean(retail_and_recreation), grocery_and_pharmacy = mean(grocery_and_pharmacy),
parks = mean(parks), transit_stations = mean(transit_stations), workplaces = mean(workplaces), residential = mean(residential), average_mobility = mean(average_mobility)) %>%
ungroup()
if (!is.na(w_data$country_code)) {
w_data$type <- t
w_data$date_inf_mid <- c_data$date_inf_mid[i]
weekly_mobility_data <- rbind(weekly_mobility_data, w_data)
}
}
}
}
}
Based on work above the weekly mobility data can be joined:
Rt_data <- left_join(Rt_data, weekly_mobility_data, by = c("country_code", "type", "date_inf_mid"))
TO DO
Below current (last weekly) \(R_{t,m}\) estimates are tabulated. To so the data first needs to be prepared:
# find the last estimates
last_dates <- Rt_data %>% group_by(country, type) %>% summarise(date_mid = max(date_mid)) %>% ungroup()
# construct a table with nice fields names
table <- inner_join(last_dates, Rt_data, by = c("country", "type", "date_mid"))
table <- table %>% select(continent, country_code, country, type, count, date_end, Rt_li_95, Rt_mean, Rt_ui_95)
knitr::kable((table %>% filter(continent == "SA"))[, c(-1, -2)], format.args = list(big.mark = ","), digits = 1, col.names = c("Country", "Estimated Type",
"Count (Last Week)", "Week Ending", "R - Lower CI", "R - Mean", "R - Uppper CI"))
| Country | Estimated Type | Count (Last Week) | Week Ending | R - Lower CI | R - Mean | R - Uppper CI |
|---|---|---|---|---|---|---|
| EC | cases | 3,629 | 2020-06-13 | 1.7 | 1.8 | 1.8 |
| EC | deaths | 116 | 2020-06-13 | 2.5 | 3.0 | 3.5 |
| FS | cases | 109 | 2020-06-13 | 1.0 | 1.2 | 1.5 |
| FS | deaths | 0 | 2020-06-13 | 0.0 | 0.7 | 2.6 |
| GP | cases | 4,279 | 2020-06-13 | 2.1 | 2.1 | 2.2 |
| GP | deaths | 34 | 2020-06-13 | 2.0 | 2.8 | 3.7 |
| KZN | cases | 751 | 2020-06-13 | 1.2 | 1.2 | 1.3 |
| KZN | deaths | 3 | 2020-06-13 | 0.6 | 1.2 | 2.0 |
| LP | cases | 93 | 2020-06-13 | 1.2 | 1.5 | 1.9 |
| LP | deaths | 1 | 2020-06-13 | 1.1 | 9.2 | 25.6 |
| MP | cases | 96 | 2020-06-13 | 1.4 | 1.7 | 2.1 |
| NC | cases | 39 | 2020-06-13 | 0.7 | 1.0 | 1.3 |
| NC | deaths | 0 | 2020-06-13 | 0.1 | 4.6 | 17.2 |
| NW | cases | 510 | 2020-06-13 | 1.6 | 1.7 | 1.9 |
| NW | deaths | 4 | 2020-06-13 | 2.7 | 8.2 | 16.7 |
| SA | cases | 19,762 | 2020-06-13 | 1.3 | 1.3 | 1.3 |
| SA | deaths | 471 | 2020-06-13 | 1.3 | 1.4 | 1.5 |
| WC | cases | 10,262 | 2020-06-13 | 1.0 | 1.1 | 1.1 |
| WC | deaths | 312 | 2020-06-13 | 1.0 | 1.1 | 1.2 |
The table below contains \(R_{t,m}\) estimates for the last week available. These estimates are based either on cases or deaths and a 95% confidence interval is shown.
knitr::kable((table %>% filter(continent != "SA" & country_code %in% country_codes))[, c(-1, -2)], format.args = list(big.mark = ","), digits = 1, col.names = c("Country",
"Estimated Type", "Count (Last Week)", "Week Ending", "R - Lower CI", "R - Mean", "R - Uppper CI"))
| Country | Estimated Type | Count (Last Week) | Week Ending | R - Lower CI | R - Mean | R - Uppper CI |
|---|---|---|---|---|---|---|
| Brazil | cases | 177,677 | 2020-06-14 | 1.0 | 1.0 | 1.0 |
| Brazil | deaths | 6,790 | 2020-06-14 | 0.9 | 1.0 | 1.0 |
| Canada | cases | 3,353 | 2020-06-14 | 0.7 | 0.7 | 0.8 |
| Canada | deaths | 334 | 2020-06-14 | 0.5 | 0.6 | 0.7 |
| Chile | cases | 39,610 | 2020-06-14 | 1.2 | 1.2 | 1.2 |
| Chile | deaths | 1,560 | 2020-06-14 | 1.7 | 1.8 | 1.8 |
| India | cases | 74,294 | 2020-06-14 | 1.2 | 1.2 | 1.2 |
| India | deaths | 2,266 | 2020-06-14 | 1.2 | 1.3 | 1.3 |
| Ireland | cases | 112 | 2020-06-14 | 0.4 | 0.5 | 0.6 |
| Ireland | deaths | 35 | 2020-06-14 | 0.7 | 0.9 | 1.3 |
| Italy | cases | 1,850 | 2020-06-14 | 0.8 | 0.8 | 0.8 |
| Italy | deaths | 455 | 2020-06-14 | 0.8 | 0.9 | 0.9 |
| Peru | cases | 33,374 | 2020-06-14 | 0.9 | 0.9 | 0.9 |
| Peru | deaths | 1,197 | 2020-06-14 | 1.1 | 1.2 | 1.3 |
| South_Africa | cases | 19,763 | 2020-06-14 | 1.3 | 1.3 | 1.3 |
| South_Africa | deaths | 471 | 2020-06-14 | 1.3 | 1.4 | 1.5 |
| Spain | cases | 2,295 | 2020-06-13 | 0.9 | 0.9 | 0.9 |
| Spain | deaths | 1 | 2020-06-13 | 0.0 | 0.0 | 0.1 |
| United_Kingdom | cases | 9,507 | 2020-06-14 | 0.8 | 0.8 | 0.8 |
| United_Kingdom | deaths | 1,197 | 2020-06-14 | 0.6 | 0.7 | 0.7 |
Below we show various extremes of \(R_{t,m}\) where counts (deaths or cases) exceed 50 in the last week.
knitr::kable((table %>% filter(continent != "SA" & type == "deaths" & count > 50) %>% arrange(Rt_mean))[1:10, c(-1, -2)], format.args = list(big.mark = ","),
digits = 1, col.names = c("Country", "Estimated Type", "Count (Last Week)", "Week Ending", "R - Lower CI", "R - Mean", "R - Uppper CI"))
| Country | Estimated Type | Count (Last Week) | Week Ending | R - Lower CI | R - Mean | R - Uppper CI |
|---|---|---|---|---|---|---|
| Belgium | deaths | 70 | 2020-06-14 | 0.4 | 0.6 | 0.7 |
| Canada | deaths | 334 | 2020-06-14 | 0.5 | 0.6 | 0.7 |
| France | deaths | 256 | 2020-06-14 | 0.6 | 0.7 | 0.7 |
| United_Kingdom | deaths | 1,197 | 2020-06-14 | 0.6 | 0.7 | 0.7 |
| Germany | deaths | 119 | 2020-06-14 | 0.6 | 0.7 | 0.8 |
| Sweden | deaths | 218 | 2020-06-14 | 0.6 | 0.7 | 0.8 |
| Turkey | deaths | 123 | 2020-06-14 | 0.7 | 0.8 | 0.9 |
| Italy | deaths | 455 | 2020-06-14 | 0.8 | 0.9 | 0.9 |
| United_States_of_America | deaths | 5,634 | 2020-06-14 | 0.9 | 0.9 | 0.9 |
| Mexico | deaths | 3,361 | 2020-06-14 | 0.9 | 0.9 | 1.0 |
knitr::kable((table %>% filter(continent != "SA" & type == "cases" & count > 50) %>% arrange(Rt_mean))[1:10, c(-1, -2)], format.args = list(big.mark = ","),
digits = 1, col.names = c("Country", "Estimated Type", "Count (Last Week)", "Week Ending", "R - Lower CI", "R - Mean", "R - Uppper CI"))
| Country | Estimated Type | Count (Last Week) | Week Ending | R - Lower CI | R - Mean | R - Uppper CI |
|---|---|---|---|---|---|---|
| Sri_Lanka | cases | 70 | 2020-06-14 | 0.3 | 0.3 | 0.4 |
| Djibouti | cases | 280 | 2020-06-14 | 0.3 | 0.3 | 0.4 |
| Malaysia | cases | 142 | 2020-06-14 | 0.3 | 0.4 | 0.4 |
| Ireland | cases | 112 | 2020-06-14 | 0.4 | 0.5 | 0.6 |
| Maldives | cases | 112 | 2020-06-14 | 0.5 | 0.5 | 0.6 |
| Gabon | cases | 362 | 2020-06-14 | 0.5 | 0.6 | 0.7 |
| Uganda | cases | 101 | 2020-06-14 | 0.6 | 0.7 | 0.8 |
| Tajikistan | cases | 518 | 2020-06-14 | 0.6 | 0.7 | 0.7 |
| Canada | cases | 3,353 | 2020-06-14 | 0.7 | 0.7 | 0.8 |
| Malawi | cases | 120 | 2020-06-14 | 0.6 | 0.8 | 0.9 |
knitr::kable((table %>% filter(continent != "SA" & type == "deaths" & count > 50) %>% arrange(-Rt_mean))[1:10, c(-1, -2)], format.args = list(big.mark = ","),
digits = 1, col.names = c("Country", "Estimated Type", "Count (Last Week)", "Week Ending", "R - Lower CI", "R - Mean", "R - Uppper CI"))
| Country | Estimated Type | Count (Last Week) | Week Ending | R - Lower CI | R - Mean | R - Uppper CI |
|---|---|---|---|---|---|---|
| Cameroon | deaths | 72 | 2020-06-14 | 2.8 | 3.6 | 4.5 |
| Iraq | deaths | 231 | 2020-06-14 | 1.6 | 1.9 | 2.1 |
| Chile | deaths | 1,560 | 2020-06-14 | 1.7 | 1.8 | 1.8 |
| Afghanistan | deaths | 124 | 2020-06-14 | 1.2 | 1.4 | 1.7 |
| Moldova | deaths | 67 | 2020-06-14 | 1.1 | 1.4 | 1.7 |
| South_Africa | deaths | 471 | 2020-06-14 | 1.3 | 1.4 | 1.5 |
| Philippines | deaths | 80 | 2020-06-14 | 1.1 | 1.4 | 1.7 |
| Argentina | deaths | 167 | 2020-06-14 | 1.1 | 1.3 | 1.5 |
| Saudi_Arabia | deaths | 256 | 2020-06-14 | 1.2 | 1.3 | 1.5 |
| Colombia | deaths | 387 | 2020-06-14 | 1.2 | 1.3 | 1.4 |
knitr::kable((table %>% filter(continent != "SA" & type == "cases" & count > 50) %>% arrange(-Rt_mean))[1:10, c(-1, -2)], format.args = list(big.mark = ","),
digits = 1, col.names = c("Country", "Estimated Type", "Count (Last Week)", "Week Ending", "R - Lower CI", "R - Mean", "R - Uppper CI"))
| Country | Estimated Type | Count (Last Week) | Week Ending | R - Lower CI | R - Mean | R - Uppper CI |
|---|---|---|---|---|---|---|
| Angola | cases | 52 | 2020-06-14 | 2.5 | 3.3 | 4.3 |
| Eswatini | cases | 164 | 2020-06-14 | 2.3 | 2.7 | 3.1 |
| Benin | cases | 144 | 2020-06-14 | 2.2 | 2.6 | 3.0 |
| Bulgaria | cases | 555 | 2020-06-14 | 2.0 | 2.2 | 2.4 |
| Kosovo | cases | 295 | 2020-06-14 | 1.9 | 2.1 | 2.4 |
| Jordan | cases | 158 | 2020-06-14 | 1.8 | 2.1 | 2.4 |
| China | cases | 102 | 2020-06-14 | 1.7 | 2.0 | 2.4 |
| Greece | cases | 160 | 2020-06-14 | 1.7 | 1.9 | 2.2 |
| Zambia | cases | 203 | 2020-06-14 | 1.7 | 1.9 | 2.1 |
| Sao_Tome_and_Principe | cases | 146 | 2020-06-14 | 1.5 | 1.8 | 2.1 |
Let’s generate our plots for each country/province in a list. We filter out weeks where the upper end of confidence interval for $R_{t,m} exceeds five.
country_plots <- list()
for (c in levels(Rt_data$country)) {
p1 <- ggplot(Rt_ci_data %>% filter(country == c & ci == "95%" & Rt_ui <= 5), aes(x = date_mid, y = Rt_mean)) + geom_crossbar(position = position_dodge2(padding = 0),
aes(ymin = Rt_li, ymax = Rt_ui, colour = type, fill = type), width = 7) + scale_fill_manual(values = c(alpha("deepskyblue4", 0.45), alpha("#8b0068",
0.45))) + scale_colour_manual(values = c(alpha("deepskyblue4"), alpha("#8b0068"))) + scale_y_continuous(limits = c(0, 5)) + scale_x_date(date_breaks = "1 months",
labels = date_format("%e %b")) + theme_bw() + theme(axis.text.x = element_text(angle = 45, hjust = 1), legend.position = "right") + guides(fill = guide_legend(ncol = 1)) +
xlab("Week") + ylab(expression(R[t, m])) + ggtitle(c) + geom_hline(yintercept = 1)
country_plots[[c]] <- p1
}
Below we plot all the provinces:
ggarrange(country_plots[["EC"]], country_plots[["FS"]], country_plots[["GP"]], country_plots[["KZN"]], country_plots[["LP"]], country_plots[["MP"]],
country_plots[["NC"]], country_plots[["NW"]], country_plots[["WC"]], country_plots[["SA"]], ncol = 2, nrow = 5) + ggtitle("Reproductive number by province, with 95% confidence intervals")
Below we plot all the countries:
ggarrange(country_plots[["Brazil"]], country_plots[["Canada"]], country_plots[["Chile"]], country_plots[["India"]], country_plots[["Ireland"]], country_plots[["Italy"]],
country_plots[["Peru"]], country_plots[["South_Africa"]], country_plots[["Spain"]], country_plots[["United_Kingdom"]], ncol = 2, nrow = 5) + ggtitle("Reproductive number by country, with 95% confidence intervals")
Below we plot estimates of \(R_{t,m}\) based on cases versus those based on deaths for countries where these estimates range between 0.5 and 2.
Below we plot where there were more than 5 deaths and cases in the last week. It appears that, generally, \(R_{t,m}^{cases}\) and \(R_{t,m}^{deaths}\) are consistent.
sp_data <- pivot_wider(table %>% filter(count >= 5 & continent == "SA"), id_cols = c("continent", "country"), names_from = "type", values_from = "Rt_mean")
ggplot(sp_data, aes(x = cases, y = deaths)) + geom_point(aes(label = country)) + geom_text(aes(label = country), hjust = 0, vjust = 0, size = 2) + ggtitle("Reproductive number by country") +
xlab(expression(R[t, m]^cases)) + ylab(expression(R[t, m]^deaths)) + scale_y_continuous(limits = c(0.5, 2)) + scale_x_continuous(limits = c(0.5,
2)) + theme_bw() + theme(axis.text.x = element_text(angle = 45, hjust = 1), legend.position = "right") + guides(fill = guide_legend(ncol = 1)) +
scale_color_hue(l = 50) + geom_abline(a = 0, b = 1)
Below we plot where there were more than 25 deaths and cases in the last week. It appears that, generally, \(R_{t,m}^{cases}\) and \(R_{t,m}^{deaths}\) are consistent.
sp_data <- pivot_wider(table %>% filter(count >= 25 & continent != "SA"), id_cols = c("continent", "country"), names_from = "type", values_from = "Rt_mean")
ggplot(sp_data, aes(x = cases, y = deaths)) + geom_point(aes(color = continent, label = country)) + geom_text(aes(label = country), hjust = 0, vjust = 0,
size = 2) + ggtitle("Reproductive number by country") + xlab(expression(R[t, m]^cases)) + ylab(expression(R[t, m]^deaths)) + scale_y_continuous(limits = c(0.5,
2)) + scale_x_continuous(limits = c(0.5, 2)) + theme_bw() + theme(axis.text.x = element_text(angle = 45, hjust = 1), legend.position = "right") +
guides(fill = guide_legend(ncol = 1)) + scale_color_hue(l = 50) + geom_abline(a = 0, b = 1)
Below we plot weeks of estimates of \(R_{t,m}\) versus the average mobility index that was constructed. There is a clear link between mobility and the reproductive number. As it decreases \(R_{t,m}\) reduces:
ggplot(Rt_data %>% filter(type == "cases" & count > 100 & country_code %in% c(country_codes)), aes(x = average_mobility, y = Rt_mean)) + geom_point(aes(colour = country)) +
ggtitle("Reproductive number by mobility") + ylab(expression(R[t, m]^cases)) + # scale_y_continuous(limits = c(0, 10)) +
theme_bw() + theme(axis.text.x = element_text(angle = 45, hjust = 1), legend.position = "right") + guides(fill = guide_legend(ncol = 1)) + scale_color_hue(l = 50)
TO DO
From the basic plots it’s clear that South Africa and a few other countries appear to be on a different “slope” than European countries show. These include Brazil, Peru, Chile and India.
The above shows estimates for reproductive number using cases and deaths (\(R_{t,m}^{cases}\) and \(R_{t,m}^{deaths}\)) for each country \(m\) over time \(t\) in weeks.
From the current estimates it appears that, at present, in general South African provincial estimates based on cases and deaths correspond. An exception to this may be Eastern Cape with estimates based on deaths exceeding those based on cases.
It is also clear that estimates based on cases appear to precede the movements based on deaths over time. This can be expected as, all things being equal, infections and associated cases should be a precursor to deaths.
South Africa does not compare well, and appears to have one of the highest \(R_{t,m}^{deaths}\) in the world. This is higher than the \(R_{t,m}^{cases}\) which may be indicative of lower testing and/or backlogs with regard to testing.
On a scatter plot of countries most appear to have \(R_{t,m}^{cases}\) correlated well with \(R_{t,m}^{deaths}\). Chile, Afghanistan and Iran appear to be outliers with $R_{t,m}^{deaths} higher than \(R_{t,m}^{cases}\) indicating perhaps limited testing, or alternatively epidemics that are slowing (given the leading nature of cases vs. deaths).
Overall it’s clear that having multiple measures of \(R_{t,m}\) appears useful and monitoring it’s value is something useful.
Detailed output for all countries are saved to a comma-separated value file below. The file can be found here.
write.csv(Rt_data, "Rt_data.csv", row.names = FALSE)
[1] A. Cori, N. M. Ferguson, C. Fraser, and S. Cauchemez, “A new framework and software to estimate time-varying reproduction numbers during epidemics,” American Journal of Epidemiology, vol. 178, no. 9, pp. 1505–1512, Sep. 2013, doi: 10.1093/aje/kwt133. [Online]. Available: https://doi.org/10.1093/aje/kwt133
[2] A. Cori, EpiEstim: EpiEstim: A package to estimate time varying reproduction numbers from epidemic curves. 2013 [Online]. Available: https://CRAN.R-project.org/package=EpiEstim
[3] V. Marivate et al., “Coronavirus disease (COVID-19) case data - South Africa.” Zenodo, 2020 [Online]. Available: https://zenodo.org/record/3888499
[4] European Centre for Disease Prevention and Control, “Data on the geographic distribution of COVID-19 cases worldwide.” European Union, 2020 [Online]. Available: https://www.ecdc.europa.eu/en/publications-data/download-todays-data-geographic-distribution-covid-19-cases-worldwide
[5] Google LLC, “Google COVID-19 community mobility reports.” 2020 [Online]. Available: https://www.google.com/covid19/mobility/
[6] P. Nouvellet et al., “Report 26: Reduction in mobility and covid-19 transmission,” Imperial College London, 2020 [Online]. Available: http://spiral.imperial.ac.uk/handle/10044/1/79643
[7] N. Ferguson et al., “Report 9: Impact of non-pharmaceutical interventions (NPIs) to reduce COVID19 mortality and healthcare demand,” Imperial College London, 2020 [Online]. Available: https://www.imperial.ac.uk/mrc-global-infectious-disease-analysis/covid-19/report-9-impact-of-npis-on-covid-19/